
    A Formalization of The Natural Gradient Method for General Similarity Measures

    In optimization, the natural gradient method is well known for likelihood maximization. The method uses the Kullback-Leibler divergence, corresponding infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter space of a family of probability distributions. This way, gradients with respect to the parameters respect the Fisher-Rao geometry of the space of distributions, which might differ vastly from the standard Euclidean geometry of the parameter space, often leading to faster convergence. However, when minimizing an arbitrary similarity measure between distributions, it is generally unclear which metric to use. We provide a general framework that, given a similarity measure, derives a metric for the natural gradient. We then discuss connections between the natural gradient method and multiple other optimization techniques in the literature. Finally, we provide computations of the formal natural gradient to show overlap with well-known cases and to compute natural gradients in novel frameworks.
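
    As a rough illustration of the update this work generalizes, the sketch below applies one natural gradient step in the familiar likelihood-maximization case, preconditioning the ordinary gradient with an empirical estimate of the Fisher information matrix. The function name, the empirical Fisher estimate, and the damping term are illustrative assumptions, not the paper's general construction for arbitrary similarity measures.

```python
import numpy as np

def natural_gradient_step(theta, grad, per_example_grads, lr=0.1, damping=1e-4):
    """One natural gradient update: precondition the loss gradient with the
    inverse of an (empirical) Fisher information matrix.

    theta             : current parameters, shape (d,)
    grad              : gradient of the loss at theta, shape (d,)
    per_example_grads : per-sample score vectors, shape (n, d), used to
                        estimate F = E[g g^T]
    """
    n, d = per_example_grads.shape
    fisher = per_example_grads.T @ per_example_grads / n
    fisher += damping * np.eye(d)              # damping keeps F well-conditioned
    nat_grad = np.linalg.solve(fisher, grad)   # F^{-1} grad
    return theta - lr * nat_grad
```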

    Customised ensemble methodologies for deep learning: boosted residual networks and related approaches

    This paper introduces a family of new customised methodologies for ensembles, called Boosted Residual Networks (BRN), which builds a boosted ensemble of Residual Networks by growing the member network at each round of boosting. The proposed approach combines recent developments in Residual Networks (a method for creating very deep networks by including a shortcut layer between different groups of layers) with Deep Incremental Boosting, a methodology to train fast ensembles of networks of increasing depth through the use of boosting. Additionally, we explore a simpler variant of Boosted Residual Networks based on Bagging, called Bagged Residual Networks (BaRN). We then analyse how recent developments in ensemble distillation can improve our results. We demonstrate that the synergy of Residual Networks and Deep Incremental Boosting has better potential than simply boosting a Residual Network of fixed structure or using the equivalent Deep Incremental Boosting without the shortcut layers, by permitting the creation of models with better generalisation in significantly less time.
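
    The sketch below shows the general shape of such a scheme, assuming a SAMME-style boosting loop in which the member network inherits the previous round's weights and is deepened by one residual block per round. The growable-network interface, the weighting rule, and all hyperparameters are assumptions for illustration, not the paper's exact procedure.

```python
import copy
import torch
import torch.nn as nn

class GrowableResNet(nn.Module):
    """A small residual network whose depth can be increased between
    boosting rounds by appending another residual block."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.blocks = nn.ModuleList()
        self.head = nn.Linear(dim, n_classes)
        self.dim = dim

    def grow(self):
        self.blocks.append(nn.Sequential(nn.Linear(self.dim, self.dim), nn.ReLU()))

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)                  # shortcut connection
        return self.head(x)

def boosted_residual_networks(X, y, n_classes, rounds=3, epochs=50):
    """SAMME-style boosted ensemble of growing residual networks.

    X : float tensor of shape (n, d); y : long tensor of shape (n,).
    Returns a list of (vote weight, trained member network) pairs.
    """
    n = len(X)
    w = torch.full((n,), 1.0 / n)             # boosting weights over samples
    net = GrowableResNet(X.shape[1], n_classes)
    ensemble = []
    for _ in range(rounds):
        net = copy.deepcopy(net)              # inherit previous round's weights
        net.grow()                            # deepen the member network
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        for _ in range(epochs):               # training weighted by w
            opt.zero_grad()
            losses = nn.functional.cross_entropy(net(X), y, reduction="none")
            (w * losses).sum().backward()
            opt.step()
        pred = net(X).argmax(dim=1)
        err = (w * (pred != y)).sum().clamp(1e-10, 1 - 1e-10)
        alpha = torch.log((1 - err) / err) + torch.log(torch.tensor(n_classes - 1.0))
        w = w * torch.exp(alpha * (pred != y))   # upweight misclassified samples
        w = w / w.sum()
        ensemble.append((alpha.item(), net))
    return ensemble
```

    Carrying over the previous round's weights when the network is grown is what allows later, deeper members to be trained quickly, which is the point of combining residual shortcuts with incremental boosting.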

    Fluctuation-Driven Neural Dynamics Reproduce Drosophila Locomotor Patterns.

    The neural mechanisms determining the timing of even simple actions, such as when to walk or rest, are largely mysterious. One intriguing, but untested, hypothesis posits a role for ongoing activity fluctuations in neurons of central action selection circuits that drive animal behavior from moment to moment. To examine how fluctuating activity can contribute to action timing, we paired high-resolution measurements of freely walking Drosophila melanogaster with data-driven neural network modeling and dynamical systems analysis. We generated fluctuation-driven network models whose outputs (locomotor bouts) matched those measured from sensory-deprived Drosophila. From these models, we identified those that could also reproduce a second, unrelated dataset: the complex time-course of odor-evoked walking for genetically diverse Drosophila strains. Dynamical models that best reproduced both Drosophila basal and odor-evoked locomotor patterns exhibited specific characteristics. First, ongoing fluctuations were required. In a stochastic resonance-like manner, these fluctuations allowed neural activity to escape stable equilibria and to exceed a threshold for locomotion. Second, odor-induced shifts of equilibria in these models caused a depression in locomotor frequency following olfactory stimulation. Our models predict that activity fluctuations in action selection circuits cause behavioral output to more closely match sensory drive and may therefore enhance navigation in complex sensory environments. Together, these data reveal how simple neural dynamics, when coupled with activity fluctuations, can give rise to complex patterns of animal behavior.
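
    A minimal caricature of the mechanism described above: a single noise-driven rate unit relaxes toward a stable equilibrium below a locomotion threshold, and fluctuations occasionally carry it across that threshold, producing bout-like episodes. The equation, the threshold, and all parameter values are illustrative assumptions, not the paper's fitted network models.

```python
import numpy as np

def simulate_bouts(T=200.0, dt=0.01, tau=1.0, noise=0.35,
                   threshold=1.0, drive=0.0, seed=0):
    """Euler-Maruyama simulation of a single noise-driven rate unit:

        dx/dt = (-x + drive) / tau + noise * xi(t)

    Without noise, x settles at the stable equilibrium x* = drive, below the
    locomotion threshold; fluctuations occasionally push x across the
    threshold, producing bout-like episodes of 'walking'.
    """
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x = np.zeros(n)
    for t in range(1, n):
        dx = (-x[t - 1] + drive) / tau * dt + noise * np.sqrt(dt) * rng.standard_normal()
        x[t] = x[t - 1] + dx
    walking = x > threshold                 # boolean locomotor state per time step
    return x, walking

# Shifting the equilibrium (e.g. a transient 'odor' drive) changes how often
# activity crosses the threshold, and hence the frequency of locomotor bouts.
activity, walking = simulate_bouts(drive=0.3)
print("fraction of time walking:", walking.mean())
```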

    Online learning with adaptive local step sizes

    Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton’s work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
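
    For intuition, the sketch below implements a generic per-parameter step-size adaptation rule in the spirit of such methods: each local step size is multiplicatively increased when successive gradient components agree in sign and decreased when they disagree. The rule, the meta learning rate, and the toy objective are illustrative stand-ins, not the algorithm derived by the authors.

```python
import numpy as np

def adaptive_local_sgd(grad_fn, theta, steps=1000, alpha0=0.01, meta_lr=0.05):
    """Gradient descent with one adaptive step size per parameter.

    Each local step size alpha_i is multiplicatively increased when the
    current and previous gradient components agree in sign (the step was
    too small) and decreased when they disagree (the step overshot).
    """
    alpha = np.full_like(theta, alpha0)
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        grad = grad_fn(theta)
        alpha *= np.exp(meta_lr * np.sign(grad * prev_grad))  # meta update
        theta = theta - alpha * grad
        prev_grad = grad
    return theta

# Toy usage: minimize 0.5 * theta^T H theta - b^T theta with ill-scaled curvature.
H = np.diag([9.0, 0.25])
b = np.array([1.0, -2.0])
theta = adaptive_local_sgd(lambda t: H @ t - b, np.zeros(2))
print(theta)   # approaches the minimizer H^{-1} b = [1/9, -8]
```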